Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-Tenancy

Abstract

The wide adoption of smart devices and Internet-of-Things (IoT) sensors has led to massive growth in data generation at the edge of the Internet over the past decade. Intelligent real-time analysis of such a high volume of data, particularly leveraging highly accurate deep learning (DL) models, often requires the data to be processed as close to the data sources as possible (or at the edge of the Internet) to minimize network and processing latency. The advent of specialized, low-cost, and power-efficient edge devices has greatly facilitated DL inference tasks at the edge. However, limited research has been done to improve inference throughput (e.g., the number of inferences per second) by exploiting various system techniques. This study investigates system techniques, such as batched inferencing, AI multi-tenancy, and clusters of AI accelerators, which can significantly enhance overall inference throughput on edge devices with DL models for image classification tasks. In particular, AI multi-tenancy enables collective utilization of edge devices’ system resources (CPU, GPU) and AI accelerators (e.g., Edge Tensor Processing Units; EdgeTPUs). The evaluation results show that batched inferencing yields more than 2.4× throughput improvement on devices equipped with high-performance GPUs like Jetson Xavier NX. Moreover, with multi-tenancy approaches, e.g., concurrent model executions (CME) and dynamic model placements (DMP), DL inference throughput on edge devices (with GPUs) and EdgeTPU can be further improved by up to 3× and 10×, respectively. Furthermore, we present a detailed analysis of the hardware and software factors that change DL inference throughput on edge devices and EdgeTPUs, thereby shedding light on areas that could be improved to achieve ...
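
To make the throughput metric and two of the techniques named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: `fake_infer` is a hypothetical stand-in for a real DL model, `make_batches`, `run_batched`, and `run_concurrent` are illustrative names, and plain threads are only an assumed approximation of concurrent model executions (CME) on a shared device.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(batch):
    """Hypothetical stand-in for a DL model: one class label per input."""
    return [x % 10 for x in batch]


def make_batches(inputs, batch_size):
    """Split the inputs into consecutive batches of at most batch_size items."""
    return [inputs[i:i + batch_size] for i in range(0, len(inputs), batch_size)]


def run_batched(model, inputs, batch_size):
    """Run batched inference; return (outputs, inferences per second)."""
    start = time.perf_counter()
    outputs = []
    for batch in make_batches(inputs, batch_size):
        outputs.extend(model(batch))
    elapsed = time.perf_counter() - start
    # Throughput = number of completed inferences / wall-clock seconds.
    throughput = len(outputs) / elapsed if elapsed > 0 else float("inf")
    return outputs, throughput


def run_concurrent(models, inputs, batch_size):
    """CME-style sketch: each tenant model runs in its own thread."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(run_batched, m, inputs, batch_size) for m in models]
        return [f.result() for f in futures]


if __name__ == "__main__":
    images = list(range(64))  # placeholder inputs, not real images
    _, ips = run_batched(fake_infer, images, batch_size=8)
    print(f"single model: {ips:.0f} inferences/s")
    for i, (_, tenant_ips) in enumerate(run_concurrent([fake_infer] * 2, images, 8)):
        print(f"tenant {i}: {tenant_ips:.0f} inferences/s")
```

In a real deployment the stand-in would be replaced by a framework call on the target device, and the aggregate throughput across tenants, rather than the per-tenant figure, is the quantity the abstract's CME and DMP comparisons refer to.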

Similar resources

ZNNi - Maximizing the Inference Throughput of 3D Convolutional Networks on Multi-Core CPUs and GPUs

Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation, and object detection and localization. Here we consider the problem of inference, the application of a previously trained ConvNet, with emphasis on 3D images. Our goal is to maximize throughput, defined as average number of output voxels computed per unit time....

Learning Deep Architectures for AI

Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-...

ParaDrop: Enabling Lightweight Multi-tenancy at the Network's Extreme Edge

We introduce ParaDrop, a specific edge computing platform that provides (modest) computing and storage resources at the “extreme” edge of the network, allowing third-party developers to flexibly create new types of services. This extreme edge of the network is the WiFi Access Point (AP) or the wireless gateway through which all end-device traffic (personal devices, sensors, etc.) passes. ...

Maximizing the Throughput of Cuckoo Hashing in Network Devices

Hash tables form a core component of network devices. Because of their large size, they are implemented using both fast on-chip SRAM and slow off-chip DRAM. However, this makes their implementation particularly delicate, as a suboptimal choice of the hashing scheme parameters may result in a higher average query time, and therefore in a lower throughput. Since hash tables are ...

Deep Learning for Causal Inference

In this paper, we propose the use of deep learning techniques in econometrics, specifically for causal inference and for estimating individual as well as average treatment effects. The contribution of this paper is twofold: 1. For generalized neighbor matching to estimate individual and average treatment effects, we analyze the use of autoencoders for dimensionality reduction while maintaining t...

Journal

Journal title: ACM Transactions on Internet Technology

Year: 2023

ISSN: 1533-5399, 1557-6051

DOI: https://doi.org/10.1145/3546192